[00:00.820 --> 00:03.820]  Hi everyone, welcome to our talk, Hacking the Supply Chain,
[00:03.820 --> 00:08.320]  about the Ripple20 vulnerabilities that haunt hundreds of millions of critical devices.
[00:08.440 --> 00:10.720]  A little bit about us, JSOF.
[00:10.720 --> 00:12.880]  So JSOF is a software security consultancy,
[00:12.880 --> 00:16.220]  we do a lot of security research, penetration testing,
[00:16.220 --> 00:19.140]  we help companies with their secure development processes,
[00:19.140 --> 00:20.640]  as well as some training.
[00:21.220 --> 00:24.200]  My name is Shlomi Oberman, I'm a co-founder at JSOF.
[00:24.200 --> 00:26.580]  Speaking together with me today will be Moshe Kol,
[00:26.780 --> 00:28.160]  a security researcher at JSOF,
[00:28.160 --> 00:30.580]  and the finder of the Ripple20 vulnerabilities,
[00:30.720 --> 00:33.940]  and Ariel Schön, a security researcher at JSOF
[00:33.940 --> 00:36.580]  who was also heavily involved in the research.
[00:37.100 --> 00:39.760]  We'll be talking about Ripple20 in general today,
[00:39.760 --> 00:42.640]  explaining what it is and how it evolved.
[00:42.640 --> 00:45.480]  Then we'll be going into detail about one of the vulnerabilities,
[00:45.480 --> 00:48.260]  the vulnerability that we find the most interesting
[00:48.260 --> 00:50.600]  out of the Ripple20 vulnerabilities.
[00:50.740 --> 00:52.920]  Short spoiler here, it's one CVE,
[00:52.920 --> 00:55.120]  but we'll actually be talking about several vulnerabilities
[00:55.120 --> 00:58.420]  hiding behind this one CVE.
[00:58.560 --> 01:00.820]  And then we'll be going in-depth into exploitation
[01:00.820 --> 01:03.060]  on a specific vulnerable device,
[01:03.060 --> 01:05.580]  with a specific configuration.
[01:06.220 --> 01:08.220]  So, what is Ripple20?
[01:08.220 --> 01:10.980]  Ripple20 is a series of 19 zero-day vulnerabilities
[01:10.980 --> 01:14.340]  in a TCP/IP stack called Treck TCP/IP.
[01:14.440 --> 01:16.740]  We say 19 zero-day vulnerabilities,
[01:16.740 --> 01:18.780]  but it sort of depends how you count.
[01:18.780 --> 01:21.820]  There are quite a few more discrete bugs
[01:21.820 --> 01:24.060]  in these 19 vulnerabilities,
[01:24.060 --> 01:26.420]  and two of the vulnerabilities were reported anonymously
[01:26.420 --> 01:28.720]  at the same time as we reported,
[01:28.720 --> 01:30.320]  so it depends how you count.
[01:30.320 --> 01:32.820]  These vulnerabilities were amplified by the supply chain
[01:32.820 --> 01:34.680]  to affect hundreds of millions of devices
[01:34.680 --> 01:37.200]  in all kinds of verticals.
[01:37.280 --> 01:39.980]  Medical, a lot of critical devices,
[01:39.980 --> 01:43.280]  industrial control, enterprise, and a bunch of others.
[01:43.280 --> 01:45.500]  We'll explain how this came to happen.
[01:46.180 --> 01:48.460]  Out of the Ripple20 vulnerabilities,
[01:48.460 --> 01:52.280]  four of the vulnerabilities are critical remote code execution vulnerabilities,
[01:52.280 --> 01:56.220]  and in addition, eight vulnerabilities, or eight CVEs,
[01:56.220 --> 01:58.460]  are medium to high-severity vulnerabilities,
[01:58.460 --> 02:02.520]  some of which could also potentially lead to remote code execution,
[02:02.520 --> 02:07.500]  pending further research, if anyone wants to go into that.
[02:08.560 --> 02:11.760]  The vulnerabilities affect hundreds of millions of devices
[02:11.760 --> 02:14.340]  from brand names you all know.
[02:14.820 --> 02:18.940]  We currently have over 100 different vendors
[02:18.940 --> 02:21.360]  with suspected affected devices,
[02:21.360 --> 02:24.680]  meaning different vendors are looking into these devices
[02:24.680 --> 02:27.100]  to see whether they're affected or not.
[02:27.100 --> 02:30.620]  These include Fortune 500, Global 500 companies,
[02:30.620 --> 02:32.980]  such as those you see on the slide,
[02:32.980 --> 02:36.300]  as well as one-person specialty shops,
[02:36.300 --> 02:41.220]  and quite a large range and diverse range of vendors.
[02:42.240 --> 02:45.860]  You can see the Ripple20 vulnerabilities in medical devices
[02:45.860 --> 02:50.340]  when you go to the hospital, in your home, in your company,
[02:50.340 --> 02:53.540]  when you turn on the light, when you turn on the water.
[02:54.060 --> 02:56.740]  Transportation devices are affected by Ripple20.
[02:56.740 --> 03:01.320]  Pretty much everywhere you go, you'll see IoT devices in general,
[03:01.320 --> 03:05.060]  as well as specifically devices affected by the Ripple20 vulnerabilities.
[03:06.160 --> 03:08.760]  Our current assumption, based on what we know,
[03:08.760 --> 03:11.800]  the suspected devices, as well as the confirmed devices,
[03:11.800 --> 03:16.120]  meaning the different vendors confirmed the devices are affected,
[03:16.120 --> 03:19.340]  we assume every medium to large organization in the US
[03:19.340 --> 03:21.880]  has at least one vulnerable device.
[03:21.880 --> 03:25.640]  Of course, some networks have many more devices,
[03:25.640 --> 03:28.260]  such as data centers or utility companies,
[03:28.260 --> 03:33.760]  but even any other kind of organization might have an affected printer,
[03:34.260 --> 03:37.760]  a hospital might have affected IV pumps, etc.
[03:38.060 --> 03:39.700]  How did this come to happen?
[03:39.700 --> 03:41.520]  So this happened because of the supply chain
[03:42.140 --> 03:44.000]  and security in the supply chain.
[03:44.000 --> 03:45.980]  Not in the sense that somebody put a backdoor
[03:45.980 --> 03:49.580]  into a component built in another country,
[03:49.580 --> 03:52.740]  but in the sense that a piece of code, a library,
[03:52.740 --> 03:55.660]  had a vulnerability, like any software has a vulnerability.
[03:55.720 --> 03:58.020]  And this piece of code was sold to an operating system.
[03:58.020 --> 04:01.360]  It was then sold onwards to a system on module,
[04:01.360 --> 04:03.600]  and the system on module embedded the operating system
[04:03.600 --> 04:06.200]  with our vulnerable library.
[04:06.260 --> 04:10.180]  And from this point, nobody even knows they're using this piece of code,
[04:10.180 --> 04:12.360]  this Treck TCP/IP stack.
[04:12.360 --> 04:14.920]  This system on module is then used in different devices,
[04:14.920 --> 04:19.520]  such as IV pumps, and the whole chain becomes vulnerable.
[04:19.520 --> 04:22.600]  Just imagine what happens if one of the companies along the way
[04:22.600 --> 04:25.240]  goes bankrupt or ceases operation,
[04:25.240 --> 04:28.580]  just how difficult it is to track down the vulnerable devices
[04:28.580 --> 04:32.820]  and how complex it is to fix and patch.
[04:33.120 --> 04:36.380]  And so you have a network of devices,
[04:36.520 --> 04:37.820]  a network of different vendors,
[04:37.820 --> 04:40.500]  each selling to each other to create final products,
[04:40.500 --> 04:44.400]  built like Lego from different pieces and different parts.
[04:44.400 --> 04:48.700]  And then when one part at the very beginning of the supply chain,
[04:48.700 --> 04:51.900]  strategically located at the beginning of the supply chain,
[04:51.900 --> 04:53.680]  gets affected by a vulnerability,
[04:53.680 --> 04:56.500]  different devices along the supply chain get affected.
[04:56.680 --> 04:58.500]  This is what happened with Ripple20,
[04:58.500 --> 05:00.760]  and this is why it affects so many devices.
[05:02.080 --> 05:06.940]  So, why did we choose to do this research on Treck TCP/IP,
[05:06.940 --> 05:09.840]  and why do we think it's important for the security industry as a whole?
[05:10.120 --> 05:13.620]  Well, for starters, this aspect of the supply chain,
[05:13.620 --> 05:16.800]  of vulnerabilities existing in the supply chain,
[05:16.800 --> 05:19.520]  not as backdoors, but just as regular vulnerabilities,
[05:19.520 --> 05:24.740]  and traveling, rippling from device to device in this ripple effect,
[05:24.740 --> 05:26.660]  this is something that is mostly unexplored.
[05:26.660 --> 05:29.780]  It's been discussed, we've seen some examples,
[05:29.780 --> 05:32.460]  but Ripple20 is really a prime example of what happens
[05:33.260 --> 05:36.280]  when a vulnerability is located this deep in the supply chain
[05:36.280 --> 05:38.280]  and exists for so many years.
[05:38.280 --> 05:41.580]  So, there's one vulnerability, multiple products,
[05:41.580 --> 05:46.640]  huge impact, and this is quite the beginning of a discussion
[05:46.640 --> 05:50.140]  of what should be done to fix these types of vulnerabilities.
[05:50.920 --> 05:55.480]  Another interesting thing is, for us, it was, of course, a good attack surface,
[05:55.480 --> 06:00.000]  and going forward, there's the potential for zombie vulnerabilities.
[06:00.060 --> 06:04.240]  Zombie vulnerabilities are these "it's complicated" type of vulnerabilities:
[06:04.240 --> 06:06.720]  we don't know if they're one days, we don't know if they're zero days,
[06:06.720 --> 06:09.160]  and we believe quite a few of the vendors won't be patching.
[06:10.840 --> 06:14.700]  Either they don't know they have Treck, the devices can't be patched,
[06:14.700 --> 06:18.500]  or the companies went bankrupt, and so you have these walking dead vulnerabilities
[06:18.500 --> 06:22.360]  that are supposed to be patched, they've already been reported,
[06:22.360 --> 06:24.580]  but they still exist in the wild.
[06:24.580 --> 06:27.800]  So, this is quite an interesting phenomenon, both for us,
[06:27.800 --> 06:29.300]  as well as for the industry.
[06:30.260 --> 06:36.120]  Treck specifically was chosen because it's an extremely successful TCP/IP stack,
[06:36.120 --> 06:41.400]  used widely in the embedded device world, in the IoT world,
[06:41.400 --> 06:43.020]  and it's been available for over 20 years.
[06:43.020 --> 06:45.920]  So, during those 20 years, or 20-something years,
[06:45.920 --> 06:49.420]  it spread throughout the supply chain from device to device,
[06:49.420 --> 06:54.620]  from vendor to vendor, and reached its current scope, current impact.
[06:54.740 --> 06:58.820]  One thing that's very interesting about Treck TCP/IP,
[06:58.820 --> 07:01.440]  that made our life quite challenging,
[07:01.440 --> 07:04.100]  and made the lives of network operators challenging,
[07:04.100 --> 07:06.880]  is that the Treck stack is extremely configurable.
[07:07.140 --> 07:09.760]  So, every instance of Treck is slightly different.
[07:09.760 --> 07:13.280]  It depends when the vendor stopped support of Treck,
[07:13.280 --> 07:14.660]  what version of Treck they're using.
[07:14.660 --> 07:16.760]  It also depends how they compiled Treck:
[07:18.040 --> 07:22.260]  specific configuration options, specific features they bought or did not buy,
[07:22.260 --> 07:25.280]  how they're using it, what memory manager they're using, etc.
[07:25.540 --> 07:29.940]  And so, every version of the Treck stack looks different,
[07:29.940 --> 07:31.800]  and the vulnerabilities change.
[07:31.800 --> 07:34.640]  So, some vulnerabilities affect different devices differently,
[07:34.640 --> 07:37.020]  some devices are affected by some of the vulnerabilities,
[07:37.020 --> 07:41.940]  some devices are affected by more extreme versions of the vulnerabilities,
[07:41.940 --> 07:44.660]  while the others are affected by lesser versions.
[07:44.660 --> 07:48.160]  So, very complex, very complex to understand.
[07:48.180 --> 07:52.240]  And, of course, at the very, very beginning of a long supply chain.
[07:52.240 --> 07:54.740]  So, Treck doesn't incorporate any other components,
[07:54.740 --> 07:58.700]  which is not something that we can say about any of the other devices affected,
[07:58.700 --> 08:01.640]  which all incorporate different components, including Treck.
[08:02.640 --> 08:05.860]  As for the research itself, because Treck is so configurable,
[08:05.860 --> 08:08.260]  we had to take a few data points.
[08:08.260 --> 08:09.980]  We used six different devices.
[08:09.980 --> 08:14.720]  We spent a different amount of time on each device
[08:14.720 --> 08:17.400]  in order to understand what we're dealing with,
[08:17.400 --> 08:19.820]  in order to find variations of the vulnerabilities,
[08:20.400 --> 08:24.500]  in order to understand whether certain vendors made changes of their own
[08:24.500 --> 08:26.980]  to the stack or to the compilation.
[08:26.980 --> 08:30.940]  And so we used six different devices with differing architectures
[08:31.700 --> 08:35.460]  in order to understand what's going on,
[08:35.460 --> 08:39.520]  where this stack is, how this stack is used.
[08:39.520 --> 08:42.260]  The research took approximately nine months,
[08:42.260 --> 08:45.100]  with changing, differing intensities throughout this time.
[08:45.100 --> 08:49.260]  It started in September '19, and we reported the vulnerabilities in June '20,
[08:49.580 --> 08:51.740]  June 2020, of course.
[08:52.360 --> 08:56.320]  And then we discovered some things after the fact as well.
[08:56.320 --> 09:01.060]  For example, while trying to bypass our company firewall
[09:01.060 --> 09:04.420]  in order to see if the vulnerabilities route over the Internet,
[09:04.420 --> 09:10.100]  we discovered that because the Treck TCP/IP stack allows for encapsulation,
[09:10.100 --> 09:13.200]  you could also encapsulate these vulnerabilities
[09:13.200 --> 09:15.280]  attempting to bypass firewalls.
[09:15.280 --> 09:18.100]  So we were able to bypass our firewalls,
[09:18.100 --> 09:23.220]  and at least some of these vulnerabilities will route over the Internet.
[09:24.180 --> 09:27.800]  As with any research involved in IoT,
[09:27.800 --> 09:29.540]  especially with so many different devices,
[09:29.540 --> 09:31.360]  we saw some pretty strange architectures,
[09:31.360 --> 09:36.700]  pretty strange firmwares and firmware structures.
[09:37.100 --> 09:42.240]  The device we'll be talking about today has one of these unique architectures.
[09:42.240 --> 09:45.400]  We can't talk about everything in the scope of 45 minutes.
[09:45.400 --> 09:50.820]  Of course, these are very complex, detailed things.
[09:50.820 --> 09:53.820]  The exploitation of all the vulnerabilities is something
[09:53.820 --> 09:56.120]  that we won't be able to go into in depth.
[09:56.540 --> 10:00.340]  We did release two white papers with full details,
[10:00.340 --> 10:04.080]  as well as pseudocode and specific references
[10:04.080 --> 10:08.680]  to the devices and the memory configurations that we exploited.
[10:08.700 --> 10:10.820]  Feel free to look them up on our site
[10:10.820 --> 10:13.320]  if you want to go into further technical detail.
[10:14.440 --> 10:16.860]  A little bit about the vulnerability, or the CVE,
[10:16.860 --> 10:18.440]  that we'll be talking about today.
[10:18.440 --> 10:22.580]  CVE-2020-11901 is a critical vulnerability
[10:22.580 --> 10:25.160]  in the Treck DNS resolver component,
[10:25.160 --> 10:27.280]  so a client-side DNS vulnerability.
[10:27.280 --> 10:30.920]  Once successfully exploited, it allows remote code execution,
[10:31.440 --> 10:34.320]  as we've demonstrated and as you'll see today.
[10:34.320 --> 10:35.780]  The reason we think it's so interesting
[10:35.780 --> 10:37.700]  and the reason we chose to talk about it,
[10:37.700 --> 10:39.980]  besides being remote code execution,
[10:39.980 --> 10:42.780]  is the fact that DNS traverses NAT boundaries.
[10:43.160 --> 10:47.760]  A device within your network will issue a DNS request
[10:47.760 --> 10:50.440]  which will potentially travel over the internet,
[10:50.440 --> 10:52.300]  allowing sophisticated high-end attackers
[10:52.300 --> 10:55.540]  to perform an attack from outside the network
[10:55.540 --> 10:57.740]  into the network itself,
[10:57.740 --> 11:01.560]  which is different from most vulnerabilities we see in TCP/IP stacks.
[11:01.560 --> 11:04.440]  Behind the scenes, four vulnerabilities.
[11:04.460 --> 11:06.200]  One, what we call an artifact.
[11:06.200 --> 11:08.920]  That's a bug that made it easier for us to exploit.
[11:09.080 --> 11:12.600]  And these vulnerabilities vary over time.
[11:12.600 --> 11:17.020]  So, along the years, Treck changed the stack
[11:17.020 --> 11:18.280]  and the vulnerabilities changed,
[11:18.280 --> 11:19.960]  so we see different versions of the vulnerabilities
[11:19.960 --> 11:23.460]  in different devices, depending on when they stopped using Treck.
[11:23.460 --> 11:26.520]  They also change between vendor configurations.
[11:26.680 --> 11:28.660]  One of the interesting things that we realized
[11:28.660 --> 11:34.640]  during this research, as well as during the disclosure process,
[11:34.640 --> 11:37.420]  we realized that the supply chain complexities
[11:37.420 --> 11:39.680]  are a real problem for information security,
[11:39.680 --> 11:40.900]  they're a real problem for the vendors,
[11:40.900 --> 11:42.920]  they're a real problem for the network operators,
[11:42.920 --> 11:45.980]  and they create a challenge for the whole industry.
[11:45.980 --> 11:49.580]  And we realized this is bound to happen again.
[11:49.580 --> 11:51.680]  We didn't research all the TCP/IP stacks,
[11:51.680 --> 11:53.000]  there are other TCP/IP stacks,
[11:53.000 --> 11:56.200]  there are other pieces of code, libraries,
[11:56.200 --> 11:57.680]  that exist in the supply chains
[11:57.680 --> 11:59.920]  that have been lurking around for years.
[11:59.920 --> 12:03.220]  And so, this would also be the beginning of a conversation
[12:03.220 --> 12:06.060]  about what the vendors and what the network operators can do
[12:06.060 --> 12:10.420]  to both reduce the impact of vulnerabilities like this
[12:10.420 --> 12:13.800]  when they happen, using things like exploit mitigations,
[12:13.800 --> 12:18.180]  as well as how we perform such complex disclosures
[12:18.180 --> 12:21.120]  and how we make sure that, as vendors,
[12:21.120 --> 12:23.840]  how the vendors make sure that their suppliers
[12:23.840 --> 12:26.600]  are performing secure development,
[12:26.600 --> 12:28.940]  pen testing their code, etc.
[12:28.940 --> 12:31.620]  So, this is sort of a wake-up call to some of the vendors
[12:31.620 --> 12:32.840]  and some of the network operators
[12:33.360 --> 12:37.000]  that this will happen probably again and again,
[12:37.000 --> 12:38.680]  and this can be prevented,
[12:38.680 --> 12:40.340]  or at least the impact can be reduced,
[12:40.340 --> 12:42.380]  using the right techniques.
[12:43.200 --> 12:46.280]  Now, I'm going to hand over the microphone to Moshe Kol.
[12:46.280 --> 12:48.800]  He'll be talking about the different vulnerabilities,
[12:48.800 --> 12:51.860]  where they are in the code, and what they look like.
[12:52.480 --> 12:53.720]  Thanks, Shlomi.
[12:53.740 --> 12:58.220]  Hi, I'm Moshe. I'm a security researcher at JSOF,
[12:58.220 --> 13:00.900]  and I will talk about the vulnerabilities
[13:00.900 --> 13:03.780]  that comprise CVE-2020-11901,
[13:03.780 --> 13:05.620]  also known as the DNS bugs.
[13:06.240 --> 13:07.960]  So, first, we need to refresh our memory
[13:07.960 --> 13:10.060]  about the DNS protocol.
[13:10.060 --> 13:12.160]  So, DNS is a core internet protocol
[13:13.000 --> 13:16.480]  designed to map between domain names and IP addresses.
[13:16.620 --> 13:18.160]  It's a query-response protocol,
[13:18.160 --> 13:20.420]  client-server architecture.
[13:20.560 --> 13:22.180]  So, the client resolves the name
[13:22.180 --> 13:24.700]  by issuing a query to a DNS server.
[13:25.080 --> 13:29.900]  So, for example, if you browse to www.example.com,
[13:29.900 --> 13:33.240]  your browser issues a DNS query of type A
[13:33.240 --> 13:36.300]  to one of the configured DNS servers,
[13:36.300 --> 13:38.040]  and the server looks up the name
[13:38.040 --> 13:39.820]  and returns a response.
[13:39.820 --> 13:41.500]  In this case of type A,
[13:41.500 --> 13:45.300]  the value of the response is an IPv4 address.
[13:45.580 --> 13:48.080]  So, a little bit about the record types.
[13:48.080 --> 13:50.960]  So, DNS servers can return multiple answers
[13:50.960 --> 13:53.060]  in the same DNS response.
[13:53.060 --> 13:55.920]  An answer is specified as a resource record.
[13:55.920 --> 13:58.600]  Each resource record is associated with a name.
[13:58.600 --> 14:01.700]  We'll talk about the type and class fields shortly.
[14:02.020 --> 14:05.500]  The TTL field specifies the number of seconds
[14:05.500 --> 14:07.380]  this record is valid.
[14:07.380 --> 14:11.920]  The value of the record is specified in the rdata field,
[14:11.920 --> 14:15.540]  whose length is specified in the rdlength field.
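The resource record fields described here can be sketched in Python. This is only an illustration of the standard RFC 1035 wire layout, not code from the talk or from the stack: `RRFixed` and `parse_rr_fixed` are hypothetical names, and the record's name is assumed to have been parsed already, so `offset` points at the type field.

```python
import struct
from typing import NamedTuple, Tuple


class RRFixed(NamedTuple):
    """Fixed-size fields that follow the name in a resource record."""
    rtype: int      # record type, e.g. 1 = A, 5 = CNAME, 15 = MX
    rclass: int     # record class, 1 = IN (Internet)
    ttl: int        # number of seconds the record is valid
    rdlength: int   # length of the rdata field in bytes


def parse_rr_fixed(buf: bytes, offset: int) -> Tuple[RRFixed, bytes]:
    """Parse type, class, TTL, rdlength and rdata starting at `offset`."""
    # Network byte order: two 16-bit fields, one 32-bit field, one 16-bit field.
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", buf, offset)
    start = offset + struct.calcsize("!HHIH")
    rdata = buf[start:start + rdlength]
    # Unlike the parser discussed in the talk, refuse to read past the buffer.
    if len(rdata) != rdlength:
        raise ValueError("rdlength points past the end of the packet")
    return RRFixed(rtype, rclass, ttl, rdlength), rdata


# A type A record: TTL of 300 seconds, 4 bytes of rdata (an IPv4 address).
record = struct.pack("!HHIH", 1, 1, 300, 4) + bytes([192, 0, 2, 1])
fields, rdata = parse_rr_fixed(record, 0)
print(fields, rdata)
```

The explicit check against `rdlength` is the kind of bounds validation that matters later in the talk.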
[14:15.660 --> 14:20.680]  So, both questions and answers have a type in the DNS protocol.
[14:20.940 --> 14:23.580]  Some of the common types include type A,
[14:23.580 --> 14:26.780]  which specifies an IPv4 address for the query domain.
[14:26.780 --> 14:29.040]  Type CNAME, which defines an alias,
[14:29.220 --> 14:32.320]  a canonical name for the query domain.
[14:32.400 --> 14:35.160]  This provides a level of indirection.
[14:35.160 --> 14:38.640]  And type MX, which defines the domain name
[14:38.640 --> 14:41.180]  of a mail server for the query domain.
[14:41.180 --> 14:45.520]  So, if, for example, you send an email to gmail.com,
[14:45.520 --> 14:50.300]  your mail client issues a DNS query for gmail.com,
[14:50.300 --> 14:54.400]  it gets back a domain name of a mail server for gmail.com.
[14:54.400 --> 14:56.640]  And because this is a name and not an IP address,
[14:56.640 --> 15:00.600]  this name needs to be resolved further into an IP address.
[15:00.600 --> 15:05.220]  So, the resolver issues a DNS query of type A in this case,
[15:05.220 --> 15:08.620]  in order to resolve the domain name of a mail server.
[15:08.840 --> 15:13.000]  In practice, most DNS servers simply hand in the IP address
[15:13.000 --> 15:15.100]  along with the name, but nonetheless,
[15:15.100 --> 15:19.340]  this functionality needs to be supported by the DNS resolvers.
[15:20.180 --> 15:24.040]  So, a little bit about the domain names encoding,
[15:24.040 --> 15:25.860]  the binary format.
[15:26.200 --> 15:30.020]  So, domain names are encoded as a sequence of labels.
[15:30.020 --> 15:34.240]  So, www is a label, example is a label.
[15:34.280 --> 15:37.080]  Each label is preceded by a length byte,
[15:37.080 --> 15:40.680]  specifying the number of characters this label occupies.
[15:40.680 --> 15:44.480]  And the domain name is terminated by a zero length byte.
[15:44.600 --> 15:49.460]  And according to the RFC, the maximum label length is 63.
[15:49.460 --> 15:51.020]  This will come up later.
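The label encoding just described can be sketched in a few lines of Python. This is a minimal illustration of the RFC 1035 format without compression; `encode_name` is a hypothetical helper, not a function from the stack.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted domain name as a sequence of length-prefixed labels."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:   # maximum label length per the RFC
            raise ValueError(f"invalid label length: {len(raw)}")
        out.append(len(raw))          # length byte
        out += raw                    # the label's characters
    out.append(0)                     # a zero length byte terminates the name
    return bytes(out)


print(encode_name("www.example.com"))
```

For `www.example.com` this yields the length bytes 3, 7 and 3 in front of the three labels, followed by the terminating zero byte.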
[15:51.520 --> 15:55.200]  So, what the designers of the DNS protocol noticed
[15:55.200 --> 16:00.680]  is that there is a lot of repetition inside the DNS packet itself.
[16:00.920 --> 16:05.040]  So, in an effort to reduce the size of the DNS messages,
[16:05.040 --> 16:07.380]  they employed a simple compression scheme.
[16:07.380 --> 16:09.240]  In this scheme, compression is achieved
[16:09.240 --> 16:12.000]  by replacing a sequence of labels with a pointer
[16:12.000 --> 16:14.800]  to prior occurrence of the same sequence.
[16:14.800 --> 16:17.580]  So, here you can see a sample DNS response packet.
[16:17.580 --> 16:21.200]  You can see gmail.com is specified at offset 0xc
[16:21.720 --> 16:23.880]  from the start of the packet.
[16:23.880 --> 16:26.680]  And it so happens that gmail.com needs to be specified
[16:26.680 --> 16:30.000]  multiple times inside the packet.
[16:30.000 --> 16:33.240]  So, instead of specifying gmail.com literally in the packet,
[16:33.240 --> 16:36.960]  again, we simply use the compression feature.
[16:36.960 --> 16:42.280]  We point to the previous occurrence of gmail.com.
[16:42.280 --> 16:46.140]  So, if we want to write smtp.gmail.com,
[16:46.140 --> 16:50.200]  we need to write only the first label literally,
[16:50.200 --> 16:51.440]  smtp in this case,
[16:51.440 --> 16:53.120]  and the next two labels are specified
[16:53.120 --> 16:55.280]  using the compression offset.
[16:55.820 --> 16:58.060]  So, in the compression scheme,
[16:58.060 --> 17:00.620]  compression pointers are encoded in two bytes.
[17:00.620 --> 17:04.600]  The first byte begins with 11 as the two most significant bits,
[17:04.600 --> 17:07.140]  and the other 14 bits specify an offset
[17:07.140 --> 17:08.900]  from the start of the header.
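The pointer format can be made concrete with a short Python sketch. The helper names (`is_pointer`, `pointer_offset`, `make_pointer`) are hypothetical and follow the RFC 1035 layout described above: two marker bits followed by a 14-bit offset.

```python
def is_pointer(first_byte: int) -> bool:
    """A compression pointer has both of its top two bits set (binary 11)."""
    return (first_byte & 0xC0) == 0xC0


def pointer_offset(b0: int, b1: int) -> int:
    """Extract the 14-bit offset, measured from the start of the DNS header."""
    return ((b0 & 0x3F) << 8) | b1


def make_pointer(offset: int) -> bytes:
    """Build the two-byte pointer for a given offset."""
    if not 0 <= offset < 0x4000:      # only 14 bits are available
        raise ValueError("offset does not fit in 14 bits")
    return bytes([0xC0 | (offset >> 8), offset & 0xFF])


# smtp.gmail.com, with gmail.com already present at offset 0xc of the packet:
compressed = b"\x04smtp" + make_pointer(0x0C)
print(compressed.hex(), len(compressed))
```

Written this way, `smtp.gmail.com` takes 7 bytes instead of the 16 bytes needed to spell out all three labels.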
[17:10.320 --> 17:13.200]  So, as Shlomi said, the vulnerabilities reside
[17:13.200 --> 17:16.740]  in the DNS resolver of Treck TCP/IP,
[17:16.740 --> 17:20.860]  and we found them in the DNS parsing logic in the stack,
[17:20.860 --> 17:23.700]  specifically in a function called tfDnsCallback.
[17:23.700 --> 17:28.100]  Here you can see a snippet of pseudocode from this function,
[17:28.100 --> 17:33.440]  specifically the parsing logic handling MX resource records.
[17:33.440 --> 17:37.080]  So, what we can see here is that the function computes
[17:37.520 --> 17:39.880]  the length of the MX hostname.
[17:39.880 --> 17:42.960]  Based on that length, a buffer on the heap is allocated,
[17:42.960 --> 17:46.460]  and then the MX hostname is copied as ASCII
[17:46.460 --> 17:49.920]  into the just-allocated buffer.
[17:49.920 --> 17:52.980]  So, what we can see from this snippet is that
[17:52.980 --> 17:55.800]  tfDnsLabelToAscii, the function that is responsible
[17:55.800 --> 17:58.920]  for the copy, is not aware of the length
[17:58.920 --> 18:00.880]  of the buffer being allocated.
[18:00.940 --> 18:04.640]  What it does is simply copy bytes from the encoded name
[18:04.640 --> 18:06.560]  until a null byte is reached.
[18:06.800 --> 18:09.720]  So, this means that if, for some reason,
[18:09.720 --> 18:13.920]  expandLabelLength returns a length value which is too small,
[18:13.920 --> 18:17.160]  we will have a heap-based buffer overflow vulnerability.
[18:17.160 --> 18:20.400]  So, this motivates us to look further into expandLabelLength
[18:20.400 --> 18:22.380]  and examine its operation.
[18:22.760 --> 18:25.680]  So, here you can see the pseudocode of expandLabelLength.
[18:25.680 --> 18:28.900]  What this function basically does is sum up all the length bytes
[18:28.900 --> 18:30.880]  while honoring compression.
[18:30.880 --> 18:35.120]  So, in more detail, it reads the current label length,
[18:35.120 --> 18:37.660]  then checks to see if there is compression or not.
[18:37.660 --> 18:40.740]  If there is no compression, which is the common case,
[18:40.740 --> 18:44.420]  it adds the current label length plus one to the total length variable,
[18:44.420 --> 18:47.980]  which is later returned from this function, and advances the label pointer.
[18:48.320 --> 18:51.660]  And if there is compression, it reads the compression offset,
[18:51.660 --> 18:54.960]  it computes a new label pointer based on that offset,
[18:54.960 --> 18:57.300]  and then it checks to see that the new label pointer
[18:57.300 --> 19:00.200]  points before the initial label pointer.
[19:00.200 --> 19:02.020]  So, this means we can only jump backwards
[19:02.660 --> 19:05.760]  from where we were, and the process
[19:05.760 --> 19:09.520]  continues from this new label pointer.
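The walk just described can be modeled in Python. This is a sketch reconstructed from the spoken description, not the stack's actual code: the name `expand_label_length`, the choice to return 0 when the backward-jump check fails, and the 16-bit masking of the total are assumptions made for illustration. Python raises `IndexError` on an out-of-range read, whereas the C code described in the talk would keep reading adjacent memory.

```python
def expand_label_length(packet: bytes, start: int) -> int:
    """Sum the label lengths of the name at `start`, following compression."""
    pos = start
    total = 0
    while packet[pos] != 0:                       # zero length byte ends the name
        length = packet[pos]
        if (length & 0xC0) == 0xC0:               # compression pointer
            new_pos = ((length & 0x3F) << 8) | packet[pos + 1]
            if new_pos >= start:                  # only jumps to before the
                return 0                          # initial label pointer (assumed failure value)
            pos = new_pos                         # continue from the new pointer
        else:                                     # ordinary label
            total = (total + length + 1) & 0xFFFF # total is a 16-bit variable
            pos += length + 1                     # advance past the label
    return total


# 12-byte header, gmail.com at offset 0xc, then smtp + pointer to 0xc.
packet = bytes(12) + b"\x05gmail\x03com\x00" + b"\x04smtp\xc0\x0c"
print(expand_label_length(packet, 23))
```

Note that nothing in the loop compares `pos` against the end of the packet, and nothing limits how many labels or pointers are followed.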
[19:10.320 --> 19:12.140]  So, as you can see,
[19:12.140 --> 19:14.860]  there are no bounds checks on the packet buffer here,
[19:14.860 --> 19:18.840]  so this led us to the first vulnerability, an out-of-bounds read vulnerability.
[19:19.140 --> 19:21.600]  This could result in a denial of service
[19:21.600 --> 19:24.720]  if, for example, while iterating over the length bytes,
[19:24.720 --> 19:27.040]  we read from an unmapped page.
[19:27.240 --> 19:30.240]  But more interestingly, we can cause
[19:30.240 --> 19:32.400]  an information leakage vulnerability.
[19:32.620 --> 19:36.660]  So, tfDnsLabelToAscii, the function that does the copy,
[19:36.660 --> 19:38.700]  has no bounds checks either.
[19:38.700 --> 19:42.640]  This means that data from the heap could be interpreted as an MX hostname.
[19:42.640 --> 19:45.480]  This MX hostname is later resolved by the client
[19:45.480 --> 19:48.380]  in an attempt to get an IP address.
[19:48.420 --> 19:51.040]  So, this means that we can leak data from the heap
[19:51.040 --> 19:53.920]  inside the MX hostname itself in the query.
[19:54.320 --> 19:58.480]  This vulnerability affects at least Treck version 4.7,
[19:58.480 --> 20:00.520]  and it was fixed in later versions,
[20:00.520 --> 20:01.820]  as we will see.
[20:02.200 --> 20:04.840]  We don't know the exact version of the fix,
[20:04.840 --> 20:09.600]  but nonetheless, the vulnerability still affects devices in the wild
[20:10.340 --> 20:12.980]  due to the complex supply chain effect
[20:12.980 --> 20:16.240]  and the nature of the embedded devices,
[20:16.240 --> 20:19.040]  since some don't receive updates.
[20:19.560 --> 20:21.100]  So, this is nice and all,
[20:21.100 --> 20:22.840]  but we are looking for an RCE,
[20:22.840 --> 20:25.780]  so let's go back to the function that computes the length
[20:25.780 --> 20:28.100]  and examine its operation further.
[20:28.100 --> 20:30.780]  So, there are more issues with expandLabelLength.
[20:30.780 --> 20:33.900]  So, per the RFC,
[20:33.900 --> 20:38.680]  there is a limitation on the maximum domain name length.
[20:38.680 --> 20:41.280]  The limitation is 255 characters,
[20:41.280 --> 20:44.860]  and this limitation is not enforced in expandLabelLength.
[20:44.860 --> 20:46.400]  Further, it does not validate
[20:46.400 --> 20:48.780]  the characters of the domain name.
[20:48.780 --> 20:51.840]  They should all be alphanumeric or hyphen,
[20:51.840 --> 20:53.600]  but it doesn't validate that,
[20:53.600 --> 20:57.600]  so we can embed arbitrary bytes within the name itself.
[20:57.600 --> 20:59.380]  And, last but not least,
[20:59.380 --> 21:02.740]  the totalLength variable is stored as an unsigned short,
[21:02.740 --> 21:04.320]  16 bits wide,
[21:04.320 --> 21:09.000]  and recall that the totalLength is the variable that is returned
[21:09.420 --> 21:11.980]  from the expandLabelLength.
[21:11.980 --> 21:14.000]  So, what we try to do is
[21:14.000 --> 21:19.080]  we try to get our RCE by overflowing the totalLength variable.
[21:19.600 --> 21:21.840]  So, what we need in order to pull this thing off
[21:21.840 --> 21:23.680]  is to construct a name
[21:23.680 --> 21:27.020]  whose length is larger than 64k.
[21:27.020 --> 21:30.160]  So, we ask ourselves, is it really possible?
[21:30.160 --> 21:31.960]  This is not trivial,
[21:31.960 --> 21:34.660]  since it has to be possible over UDP.
[21:34.740 --> 21:37.840]  So, can we overflow the totalLength variable
[21:37.840 --> 21:40.940]  within a single DNS response packet?
[21:41.000 --> 21:44.260]  The answer is yes, and we use the DNS compression feature
[21:44.260 --> 21:46.240]  to achieve this.
[21:46.440 --> 21:50.100]  The idea was to nest compression pointers within themselves,
[21:50.100 --> 21:53.480]  so recall that expandLabelLength does not validate the bytes
[21:53.480 --> 21:56.440]  of the domain name itself.
[21:56.440 --> 21:58.880]  This means we can embed there any bytes we want,
[21:58.880 --> 22:02.020]  and in this case we chose to embed compression pointers.
[22:02.020 --> 22:04.480]  You will see in the example shortly.
[22:04.480 --> 22:08.320]  Keep in mind during the example that we have two challenges to overcome.
[22:08.620 --> 22:12.140]  First, there is a limitation on the DNS response packet size.
[22:12.140 --> 22:14.360]  The maximum size allowed over UDP
[22:14.360 --> 22:19.480]  by the network stack is 1460 bytes.
[22:19.480 --> 22:22.360]  And keep in mind that we can only jump backwards
[22:22.360 --> 22:24.100]  from a current label pointer,
[22:24.100 --> 22:27.040]  so we will need to overcome this challenge also.
[22:28.600 --> 22:32.040]  Here you can see on the slide the basic construction we used.
[22:32.040 --> 22:35.000]  This is a name, arranged in a matrix-like form.
[22:35.000 --> 22:38.120]  Each row in this matrix has length 16.
[22:38.360 --> 22:42.260]  The bluish cells represent compression pointers,
[22:42.260 --> 22:44.740]  and the pink cells represent branch bytes.
[22:44.740 --> 22:46.960]  We will talk about those shortly.
[22:48.100 --> 22:51.540]  Let's assume we start expanding the name from this offset,
[22:51.540 --> 22:54.500]  offset 0xf in the first row.
[22:54.640 --> 22:56.520]  We can actually achieve this in practice
[22:56.520 --> 22:58.220]  by using another compression pointer
[22:58.220 --> 23:01.220]  that will land us exactly in that spot,
[23:01.220 --> 23:03.160]  but for now take it as a given.
[23:03.300 --> 23:05.920]  If we start expanding from this offset,
[23:05.920 --> 23:08.800]  this byte is interpreted as a length byte.
[23:09.000 --> 23:10.740]  In this case there is no compression,
[23:10.740 --> 23:14.840]  so the function skips 0xf plus 1 bytes
[23:14.840 --> 23:17.160]  and moves to the next row,
[23:17.160 --> 23:22.220]  as well as adding 16 to the total length variable.
[23:22.220 --> 23:23.700]  The process continues.
[23:23.700 --> 23:25.600]  Notice that we stay in the same column
[23:25.600 --> 23:31.840]  because of the special matrix shape
[23:31.840 --> 23:34.660]  until we reach branch bytes.
[23:34.660 --> 23:36.520]  The purpose of the branch byte
[23:36.520 --> 23:39.480]  is simply to lead us to the next compression pointer.
[23:39.480 --> 23:42.060]  In this case the branch byte is 0e,
[23:42.060 --> 23:44.440]  so we land in this compression pointer.
[23:44.620 --> 23:47.080]  We know that this is a compression pointer
[23:47.080 --> 23:49.100]  because the high nibble is c.
[23:49.100 --> 23:54.200]  What the function does is reads the compression offset.
[23:54.480 --> 23:57.320]  The compression offset in this case is 0e.
[23:57.320 --> 23:59.560]  It then checks to see that we point backwards
[23:59.560 --> 24:01.460]  from our initial position.
[24:01.460 --> 24:06.060]  Our initial position was 0xf, so 0xe is less than 0xf.
[24:06.060 --> 24:08.600]  We continue expanding the name from here.
[24:08.620 --> 24:10.160]  The process continues.
[24:10.620 --> 24:14.260]  We reach a branch byte and another compression pointer.
[24:14.500 --> 24:17.720]  You can see that with this compression trick
[24:17.720 --> 24:20.920]  the total length variable nearly doubled itself.
[24:21.040 --> 24:24.020]  If we continue until we reach a null byte,
[24:24.020 --> 24:26.840]  in this toy example we reach a total length value
[24:26.840 --> 24:30.580]  of 1500 bytes, which is pretty neat
[24:30.580 --> 24:32.160]  if you consider that the name itself
[24:32.160 --> 24:35.920]  only occupies 128 bytes.
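
The nesting trick can be reproduced at small scale. What follows is a minimal sketch, not the slide's actual layout and not Treck code: it assumes a 4x4 grid instead of the 16-byte rows shown, a length function that adds each label length plus one, and a pointer check that only requires the target to lie before the pointer itself. The names `expanded_length` and `build_toy_name` are hypothetical.

```python
def expanded_length(data: bytes, start: int) -> int:
    """Sum label lengths the way the talk describes, following pointers."""
    total = 0
    pos = start
    while data[pos] != 0:                      # a null byte ends the name
        b = data[pos]
        if b & 0xC0 == 0xC0:                   # compression pointer (top two bits set)
            target = ((b & 0x3F) << 8) | data[pos + 1]
            if target >= pos:                  # assumed check: backwards jumps only
                raise ValueError("forward compression pointer")
            pos = target
        else:                                  # plain length byte: skip the label
            total += b + 1
            pos += b + 1
    return total


def build_toy_name(rows: int = 4, width: int = 4) -> bytes:
    """Grid of length bytes, then pointers that re-enter one column to the left."""
    body = bytearray([width - 1] * (rows * width))
    tail = bytearray(2 + 2 * (width - 1) + 1)  # padding, pointers, null terminator
    last_row = (rows - 1) * width
    for k in range(width):
        col = width - 1 - k
        body[last_row + col] = 3 * k + 2       # branch byte: lands on tail slot k
        if k < width - 1:
            tail[2 + 2 * k] = 0xC0             # pointer to the top of column col - 1
            tail[2 + 2 * k + 1] = col - 1
    return bytes(body + tail)


name = build_toy_name()
print(len(name), expanded_length(name, start=3))
```

With the default 4x4 grid the name occupies 25 bytes, while the walk that starts at offset 3 counts 78, because every column of the grid is traversed once per compression pointer.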
[24:36.160 --> 24:38.270]  Of course, this doesn't overflow
[24:39.980 --> 24:42.960]  the total length variable yet,
[24:42.960 --> 24:44.600]  but in order to overflow it
[24:44.600 --> 24:46.600]  we use the maximum label length allowed,
[24:48.340 --> 24:51.920]  63 (0x3f), instead of the 0xf shown in the example.
[24:51.920 --> 24:53.500]  Using exactly this construction
[24:53.500 --> 24:57.200]  we reach a name whose length is
[24:57.980 --> 25:00.240]  greater than 64k, thus overflowing
[25:00.240 --> 25:02.580]  the total length variable itself.
[25:02.580 --> 25:05.580]  And remember that if expandLabelLength returns a value
[25:05.580 --> 25:08.560]  which is too small, we have a heap-based buffer overflow
[25:08.930 --> 25:11.640]  which is a good RCE candidate.
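
The effect of the 16-bit total length can be shown with plain arithmetic. This is a hedged sketch: the function name and the example length are made up for illustration, and only the wrap-around of an unsigned 16-bit value comes from the talk.

```python
def stored_total_length(true_length: int) -> int:
    """Value kept in an unsigned 16-bit variable: only the low 16 bits survive."""
    return true_length & 0xFFFF


# Hypothetical expanded name length, just above 64k.
true_length = 0x10000 + 0x24

alloc_size = stored_total_length(true_length)   # what the allocation is sized by
excess = true_length - alloc_size               # bytes the name exceeds the buffer by

print(hex(alloc_size), hex(excess))
```

The buffer is sized from the wrapped value while the copy routine walks the real name, which is the mismatch that makes the heap overflow possible.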
[25:12.500 --> 25:15.340]  Also note that this vulnerability can be triggered
[25:15.340 --> 25:18.500]  in response to every query type supported by the network stack
[25:18.500 --> 25:21.060]  by using CNAME resource records
[25:21.060 --> 25:24.140]  which must be parsed regardless of the
[25:24.140 --> 25:26.200]  original query type.
[25:26.200 --> 25:28.840]  So this vulnerability affects the latest Treck version
[25:28.840 --> 25:32.000]  at the time of disclosure, and that's considered
[25:32.000 --> 25:35.040]  dangerous from our point of view.
[25:35.040 --> 25:38.600]  So at this point, we decide to purchase
[25:38.600 --> 25:40.920]  another device that runs Treck,
[25:40.920 --> 25:44.260]  in this case a Schneider UPS device, and we want to
[25:44.260 --> 25:47.520]  know if the vulnerability affects it or not.
[25:47.520 --> 25:50.180]  And what we found is that Treck fixed the read-out-of-bounds
[25:50.180 --> 25:52.700]  vulnerability, but they fixed it badly.
[25:52.700 --> 25:54.360]  So you can see that the RDLength
[25:55.420 --> 25:58.340]  value from the resource record itself is checked against
[25:58.340 --> 26:00.980]  the remaining size of the packet.
[26:00.980 --> 26:04.280]  And now after the fix, expandLabelLength accepts
[26:04.560 --> 26:06.500]  a third argument, labelEndPtr
[26:06.500 --> 26:11.640]  which is computed based on the RDLength value.
[26:11.640 --> 26:13.780]  And what expandLabelLength does when it reaches
[26:13.780 --> 26:17.000]  this endpoint, it simply stops processing
[26:17.000 --> 26:20.920]  without any error and returns the current total length.
[26:20.920 --> 26:23.540]  So this is perfect from an attacker's standpoint
[26:23.540 --> 26:26.600]  because RDLength is attacker-controlled.
[26:26.600 --> 26:29.660]  So if, for example, we specify an RDLength value which is
[26:29.660 --> 26:33.080]  smaller than the actual value, we will have
[26:33.080 --> 26:35.280]  a heap-based buffer overflow.
[26:35.580 --> 26:38.680]  So here you can see a resource record, and instead of specifying
[26:38.680 --> 26:42.180]  an RDLength value of 20, we will specify 7.
[26:42.440 --> 26:45.160]  So the labelEndPtr will point here,
[26:45.980 --> 26:47.940]  expandLabelLength returns 5 in this case,
[26:47.940 --> 26:51.460]  but tfDnsLabelToAscii will copy the entire
[26:51.460 --> 26:54.160]  MXSourceName, thus overflowing our buffer.
[26:54.940 --> 26:57.920]  So this is the second heap-based
[26:57.920 --> 27:00.300]  buffer overflow vulnerability we found.
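
The bad RDLength case can be sketched as two routines that disagree about where the name ends. This is an illustration under assumptions: the helper names, the example name and the 2-byte MX preference field in front of it are chosen here to reproduce the numbers quoted (20, 7 and 5) and are not taken from the Treck source.

```python
def measured_length(name: bytes, end: int) -> int:
    """Length calculation that silently stops once it reaches the end pointer."""
    total = 0
    pos = 0
    while pos < end and name[pos] != 0:
        total += name[pos] + 1
        pos += name[pos] + 1
    return total


def label_to_ascii(name: bytes) -> bytes:
    """Copy routine that ignores the end pointer and runs to the null byte."""
    parts = []
    pos = 0
    while name[pos] != 0:
        n = name[pos]
        parts.append(name[pos + 1:pos + 1 + n])
        pos += n + 1
    return b".".join(parts) + b"\x00"


PREFERENCE_SIZE = 2                              # MX RDATA begins with a 16-bit preference
name = b"\x04mail\x07example\x03com\x00"         # 18 bytes, so the honest RDLength is 20

claimed_rdlength = 7                             # attacker-chosen, smaller than 20
buffer_size = measured_length(name, claimed_rdlength - PREFERENCE_SIZE)
copied = label_to_ascii(name)

print(buffer_size, len(copied))
```

In this toy case the buffer is sized for 5 bytes while 17 bytes are copied, so the remaining 12 bytes land in whatever follows the buffer on the heap.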
[27:00.300 --> 27:03.440]  And we also found, in the MX parsing logic,
[27:04.200 --> 27:05.900]  a memory leak. We found that we can leak
[27:05.900 --> 27:08.600]  an addrinfo structure. So here you can see
[27:08.600 --> 27:11.760]  an addrinfo structure is allocated, and
[27:11.760 --> 27:15.280]  in these two error flows, the addrinfo is not
[27:15.280 --> 27:18.220]  freed. So this means we can leak an addrinfo
[27:18.220 --> 27:21.140]  structure by specifying an RDLength value which is
[27:21.140 --> 27:24.260]  strictly less than 2, like 1, or by causing
[27:24.260 --> 27:27.480]  expandLabelLength to return a 0 length
[27:27.480 --> 27:31.000]  for example by using a bad compression pointer.
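
The leak pattern described here, allocate first and validate afterwards, looks roughly like the sketch below. The toy allocator, the handle scheme and the function names are hypothetical; only the two error conditions and the 0x3c size come from the talk.

```python
import itertools

ADDRINFO_SIZE = 0x3C            # size of the leaked structure, as quoted in the talk

_handles = itertools.count(1)
live = {}                       # handle -> size of every block not yet freed


def alloc(size: int) -> int:
    handle = next(_handles)
    live[handle] = size
    return handle


def free(handle: int) -> None:
    del live[handle]


def parse_mx(rdlength: int, expanded_name_length: int):
    """Allocate the addrinfo first, validate later, forget to free on error."""
    info = alloc(ADDRINFO_SIZE)
    if rdlength < 2:                 # error flow 1: returns without free(info)
        return None
    if expanded_name_length == 0:    # error flow 2: e.g. a bad compression pointer
        return None
    return info                      # success: the caller owns and later frees it


parse_mx(rdlength=1, expanded_name_length=5)
parse_mx(rdlength=7, expanded_name_length=0)
print(len(live), sum(live.values()))
```

Each malformed record leaves one 0x3c-byte block allocated for good, which is what later makes it usable as a permanent separator on the heap.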
[27:31.360 --> 27:33.180]  So the size of the leak is
[27:33.180 --> 27:35.900]  0x3c and these
[27:36.620 --> 27:39.360]  artifacts come in handy when exploiting vulnerabilities
[27:39.360 --> 27:42.520]  and in fact we use the exact same memory
[27:42.520 --> 27:45.960]  leak in our exploit, as you will see later.
[27:46.900 --> 27:48.380]  So to summarize, we saw
[27:48.380 --> 27:51.100]  three vulnerabilities that comprise
[27:51.100 --> 27:55.540]  CVE-2020-11901, and an artifact.
[27:55.540 --> 27:57.880]  So the first vulnerability, the read-out-of-bounds,
[27:57.880 --> 28:00.640]  affects older versions of the network stack
[28:00.640 --> 28:03.500]  and was fixed in later versions. The integer overflow
[28:03.500 --> 28:06.040]  vulnerability, as far as we've seen, affects
[28:06.040 --> 28:09.900]  both old and new versions of the Treck TCP/IP stack,
[28:09.900 --> 28:12.460]  and the bad RDLength vulnerability is
[28:13.080 --> 28:15.980]  a result of a bad fix for the read-out-of-bounds vulnerability
[28:15.980 --> 28:18.940]  and thus affects only newer versions of the stack.
[28:18.940 --> 28:21.880]  The artifact is present in both old and
[28:21.880 --> 28:24.720]  newer versions of the stack. And the main
[28:24.720 --> 28:27.820]  takeaway from this part is that a device can be affected by
[28:27.820 --> 28:30.940]  one or more vulnerabilities depending on the exact version
[28:30.940 --> 28:33.460]  of Treck they're using, and this
[28:33.460 --> 28:36.820]  fragmentation makes the life of the IT security
[28:36.820 --> 28:39.800]  personnel more challenging to know whether the devices are
[28:39.800 --> 28:43.340]  affected or not. So now I'll hand over the
[28:43.340 --> 28:45.880]  mic to Ariel Schon. He will talk about
[28:45.880 --> 28:47.080]  exploitation.
[28:48.040 --> 28:51.540]  Thanks, Moshe. Hi, I'm Ariel. I'm also a security researcher
[28:51.540 --> 28:54.340]  at JSOF and today I'm going to talk about exploiting
[28:54.340 --> 28:58.220]  the vulnerability, the CVE, on a Schneider
[28:58.220 --> 28:59.980]  Electric UPS device.
[29:01.880 --> 29:04.480]  So a UPS device
[29:05.220 --> 29:07.380]  is basically a big battery.
[29:07.380 --> 29:09.460]  UPS stands for uninterruptible power supply
[29:10.300 --> 29:13.340]  and you connect devices to it
[29:13.340 --> 29:15.860]  instead of directly to the wall, to the outlet
[29:16.480 --> 29:19.100]  to protect them from power outages or
[29:19.100 --> 29:21.380]  power fluctuations of any sort.
[29:22.220 --> 29:25.340]  So we're going to exploit on a UPS made by Schneider Electric
[29:25.340 --> 29:27.580]  specifically on the network card.
[29:28.240 --> 29:31.520]  This network card houses a Turbo 186
[29:31.520 --> 29:34.200]  processor. It's an x86 based processor.
[29:34.320 --> 29:36.440]  All code runs in 16-bit real mode
[29:37.580 --> 29:40.440]  meaning also there are no modern mitigations at all
[29:40.440 --> 29:42.280]  so no DEP, no ASLR.
[29:42.580 --> 29:46.040]  This processor is x86 based, but it's not strictly
[29:47.060 --> 29:49.160]  x86, seeing as this processor
[29:49.160 --> 29:52.280]  has a weird segmentation scheme. Instead of shifting
[29:52.280 --> 29:55.240]  the segment register by 4 bits like
[29:55.240 --> 29:58.160]  on x86, it does it by 8 bits and we'll see this
[29:58.160 --> 30:00.480]  feature come into play later.
[30:01.560 --> 30:04.260]  During this research, we had essentially
[30:04.260 --> 30:05.640]  no debugging capabilities
[30:07.080 --> 30:10.420]  so no JTAG, no GDB, nothing of the sort
[30:10.420 --> 30:13.260]  so we relied mainly on static analysis
[30:13.260 --> 30:16.400]  and reverse engineering using only limited
[30:16.400 --> 30:19.380]  crash dumps as assistance.
[30:19.400 --> 30:21.460]  These crash dumps, as visible, feature
[30:22.120 --> 30:26.000]  a basic stack trace and some registers but nothing much.
[30:26.000 --> 30:28.300]  So just to recap the vulnerability
[30:28.800 --> 30:32.140]  our primitive is a heap overflow through DNS response parsing
[30:32.140 --> 30:34.940]  and this is a rather new Treck
[30:34.940 --> 30:38.080]  implementation so we can only overflow
[30:38.080 --> 30:40.500]  with alphanumeric characters and hyphens
[30:40.500 --> 30:42.460]  and periods.
[30:43.140 --> 30:46.600]  And we're going to use the bad RDLength vulnerability
[30:46.600 --> 30:50.500]  variant that Moshe talked about earlier.
[30:51.260 --> 30:53.100]  So when exploiting heap
[30:53.100 --> 30:55.380]  overflows, generally there are two methods
[30:55.380 --> 30:57.320]  either through metadata corruption
[30:57.320 --> 31:01.420]  so overflowing free list pointers, block sizes
[31:02.100 --> 31:03.220]  stuff of the sort
[31:03.220 --> 31:06.800]  or by overflowing application specific data structures
[31:06.800 --> 31:09.140]  allocated on the heap.
[31:09.140 --> 31:13.100]  Generally metadata is considered a more generic exploitation method
[31:13.100 --> 31:15.140]  as it doesn't rely on extensive shaping
[31:15.140 --> 31:18.580]  and this is the exploitation method we used
[31:19.380 --> 31:22.440]  in the earlier CVE on the Digi Connect device.
[31:23.240 --> 31:26.000]  We have a white paper about this
[31:26.000 --> 31:27.660]  that you can find on our website,
[31:28.700 --> 31:31.180]  and we wondered if we can use the same
[31:31.180 --> 31:33.480]  technique on this device as well.
[31:34.460 --> 31:37.440]  So the Treck heap as implemented on the Schneider Electric UPS
[31:37.440 --> 31:39.640]  is slightly different.
[31:39.660 --> 31:42.060]  We have a free list that looks like this
[31:42.060 --> 31:45.820]  it has a size field, next pointer to the next free list block
[31:45.820 --> 31:49.340]  some free data usually containing garbage
[31:49.340 --> 31:50.620]  if it's a free block
[31:50.620 --> 31:53.400]  and another post size field.
[31:54.340 --> 31:57.020]  So this heap features a tight fit preference
[31:57.020 --> 32:00.980]  it will always allocate the smallest free block available for the allocation size requested
[32:00.980 --> 32:03.440]  it also features free block coalescing
[32:03.440 --> 32:06.260]  so there are never two adjacent free list blocks
[32:07.040 --> 32:09.960]  and it has all sorts of verifications
[32:09.960 --> 32:12.620]  and asserts that we didn't have in the previous
[32:12.620 --> 32:14.940]  implementation on the Digi device
[32:15.420 --> 32:19.020]  so for example on every heap operation the entire free list is checked
[32:19.820 --> 32:21.560]  one of the checks is that the
[32:21.560 --> 32:24.600]  first size field and the last size field
[32:24.600 --> 32:26.040]  really do match
[32:27.480 --> 32:29.920]  allocated blocks, however, are only checked when freed,
[32:30.400 --> 32:31.600]  so this is a bit easier.
[32:33.860 --> 32:37.460]  Corrupting the heap in a way that will not cause a premature crash
[32:37.460 --> 32:39.800]  due to these checks on heap operations,
[32:39.800 --> 32:43.200]  using only alphanumeric characters, is rather hard,
[32:43.200 --> 32:45.180]  so we chose to go in a different way
[32:45.180 --> 32:48.340]  by overflowing data structures this time.
[32:50.980 --> 32:55.120]  So we know we can overflow through all DNS response types,
[32:55.120 --> 32:58.660]  however we chose to overflow through MX requests specifically
[32:58.660 --> 33:00.880]  because when the device boots up,
[33:00.880 --> 33:03.280]  it will send out three MX requests.
[33:04.660 --> 33:08.800]  So that is a good exploitation primitive.
[33:09.800 --> 33:12.460]  Also, three requests is very good for us
[33:12.460 --> 33:14.580]  for the heap exploit,
[33:14.580 --> 33:17.600]  since interactivity is always advantageous.
[33:17.600 --> 33:19.320]  It allows us a bit more flexibility
[33:19.320 --> 33:22.340]  in shaping our heap exactly the way we want.
[33:22.860 --> 33:24.500]  And we don't really mind
[33:24.500 --> 33:26.720]  that this happens only on device boot,
[33:26.720 --> 33:29.920]  as we probably would have to crash the device anyway.
[33:29.920 --> 33:32.760]  As said, we had limited debugging capabilities
[33:32.760 --> 33:34.040]  during this research,
[33:34.040 --> 33:37.440]  so not much insight into the heap and how it is shaped.
[33:37.440 --> 33:39.560]  So we would like to get the heap into a state
[33:40.070 --> 33:42.820]  that is as deterministic as possible.
[33:43.600 --> 33:46.180]  So crashing is good in this manner.
[33:46.260 --> 33:51.400]  And crashing the UPS network card has a very low penalty,
[33:51.400 --> 33:54.640]  as it doesn't affect the actual UPS operation in any way.
[33:54.640 --> 33:57.760]  It doesn't affect the power supply.
[33:57.760 --> 34:00.300]  And the network card will automatically boot right up
[34:00.300 --> 34:04.240]  after a few seconds, so the penalty is very low.
[34:05.360 --> 34:07.680]  So overflowing data structures.
[34:07.680 --> 34:11.420]  We chose to overflow a structure called tsDNSCacheEntry.
[34:11.420 --> 34:13.660]  This structure holds information
[34:13.660 --> 34:16.620]  about the DNS request-response pair.
[34:16.780 --> 34:19.760]  It has all sorts of interesting fields.
[34:19.760 --> 34:22.480]  For example, the dnscAddrInfoPtr field
[34:22.480 --> 34:25.180]  that holds a list of addrinfo structs.
[34:25.180 --> 34:29.540]  These structs hold information about a certain DNS response.
[34:29.560 --> 34:31.620]  So for example, if you resolve the name,
[34:31.620 --> 34:33.460]  this structure has a field that will hold a pointer
[34:33.860 --> 34:35.820]  to the name that was resolved.
[34:37.220 --> 34:39.720]  Other than that, it has other pointers
[34:39.720 --> 34:41.720]  and interesting fields, such as,
[34:41.720 --> 34:43.960]  it's a doubly-linked list, as you can see,
[34:43.960 --> 34:45.900]  so it has next and previous entry pointers.
[34:46.000 --> 34:47.240]  And these are always interesting
[34:47.240 --> 34:49.020]  from an exploiter's point of view.
[34:49.900 --> 34:51.420]  And this structure is referenced often
[34:51.420 --> 34:53.040]  in DNS response parsing,
[34:53.040 --> 34:54.980]  which is a logic we can easily trigger.
[34:55.060 --> 34:57.460]  So this was a natural candidate for us.
[34:57.760 --> 35:00.100]  So assuming we can overflow this structure,
[35:00.100 --> 35:01.460]  what can we do with it?
[35:02.170 --> 35:07.440]  This is a pseudocode snippet from the parsing logic.
[35:07.820 --> 35:11.540]  It specifically shows how CNAME records are parsed.
[35:11.680 --> 35:14.080]  So we can see that, first thing,
[35:14.200 --> 35:16.740]  a pointer is taken from the DNS cache entry
[35:17.460 --> 35:19.400]  into a stack variable.
[35:19.400 --> 35:22.180]  Then the CNAME is allocated on the heap
[35:22.180 --> 35:23.580]  and the data is copied.
[35:23.580 --> 35:25.040]  This is, of course, data we control
[35:25.040 --> 35:27.320]  as we provide the DNS response.
[35:27.480 --> 35:30.620]  And subsequently, the pointer to the new CNAME
[35:30.620 --> 35:32.280]  allocated on the heap is placed
[35:34.140 --> 35:37.560]  into what the stack variable points to.
[35:37.600 --> 35:42.420]  Again, this pointer was taken from the DNS cache entry.
[35:42.540 --> 35:44.420]  So if we overflow the cache entry,
[35:44.420 --> 35:46.560]  we can control to where this heap pointer
[35:46.560 --> 35:49.400]  to our CNAME is written to memory.
[35:49.680 --> 35:51.920]  This is essentially a controlled pointer write.
[35:51.920 --> 35:53.440]  We can write a four-byte pointer
[35:53.440 --> 35:55.580]  to data we control on the heap.
[35:55.580 --> 35:58.200]  In x86 16-bit, this is a two-byte offset
[35:58.200 --> 35:59.440]  and two-byte segment.
[35:59.920 --> 36:01.940]  We can write this to any alphanumeric address
[36:01.940 --> 36:04.320]  as our overflow is alphanumeric.
[36:04.380 --> 36:06.980]  This is a relatively strong exploitation primitive.
[36:06.980 --> 36:09.300]  Writing data into places you're not supposed to write into
[36:09.800 --> 36:12.240]  is always good when you try to exploit.
[36:12.260 --> 36:15.720]  So this is the primitive we chose to continue with.
[36:16.720 --> 36:19.480]  Our overflow is a simple heap overflow,
[36:19.480 --> 36:21.740]  meaning it is from the end of our buffer
[36:22.540 --> 36:26.020]  with no offset into the next,
[36:26.020 --> 36:28.620]  what lies in the heap next.
[36:28.620 --> 36:31.280]  So we would like, naturally,
[36:31.280 --> 36:34.540]  the cache entry to be placed after the MXNAME buffer.
[36:34.640 --> 36:36.620]  Because of the heap structure
[36:36.620 --> 36:39.660]  in this specific trick implementation,
[36:39.660 --> 36:41.140]  we have all sorts of limitations.
[36:41.140 --> 36:44.260]  For example, the cache entry is allocated
[36:44.260 --> 36:46.400]  on request creation.
[36:46.420 --> 36:48.060]  However, the MXNAME buffer is allocated
[36:48.060 --> 36:50.080]  on DNS response parsing.
[36:50.080 --> 36:52.060]  So chronologically, it is allocated later
[36:52.060 --> 36:54.520]  when we want it on the heap to be allocated before
[36:54.520 --> 36:56.980]  so we can overflow into the cache entry.
[36:56.980 --> 37:00.460]  Also, because free blocks are checked very often
[37:00.460 --> 37:03.360]  in this heap implementation for corruption,
[37:03.360 --> 37:05.880]  we cannot overwrite free data.
[37:05.880 --> 37:08.400]  We must overwrite only allocated data.
[37:08.400 --> 37:10.400]  It will be best if we can overflow directly
[37:10.400 --> 37:12.020]  into the structure without corrupting
[37:12.020 --> 37:13.760]  any heap data in the way.
[37:14.440 --> 37:16.700]  So we need to shape the heap in some way
[37:16.700 --> 37:18.380]  to get this to happen.
[37:19.260 --> 37:23.140]  A specific hole pattern is preferable
[37:23.140 --> 37:26.400]  as we would like to overcome the chronological problem.
[37:26.400 --> 37:28.840]  We can do this because of the tight fit preference.
[37:28.840 --> 37:31.880]  So if we create, for example, hole two,
[37:31.880 --> 37:33.820]  as seen in this diagram,
[37:33.820 --> 37:37.220]  this is the size of the DNS cache entry structure.
[37:37.800 --> 37:39.640]  And before it, we have another hole
[37:39.640 --> 37:42.320]  the size of the MXNAME buffer we're going to allocate.
[37:42.580 --> 37:44.700]  We can overflow from the MXNAME buffer
[37:44.700 --> 37:45.980]  into the cache entry.
[37:46.220 --> 37:49.120]  We do need to separate them with some allocated separator
[37:49.820 --> 37:51.740]  to prevent the two free blocks
[37:51.740 --> 37:54.760]  from being coalesced together and ruining our shape.
[37:55.280 --> 37:57.840]  So we need some allocation primitives to create this.
[37:57.840 --> 37:59.840]  We need an allocation to create holes
[37:59.840 --> 38:01.900]  and an allocation to create the separator.
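
The hole pattern and the tight fit preference can be modelled with a toy allocator. This is a sketch under assumptions: the addresses, the hole sizes and the `tight_fit_alloc` helper are invented for illustration, block metadata is ignored, and only the 0x3c separator size and the smallest-fitting-block policy come from the talk.

```python
def tight_fit_alloc(free_blocks: dict, size: int) -> int:
    """Take the smallest free block that can hold `size` and return its address."""
    fitting = [addr for addr, block_size in free_blocks.items() if block_size >= size]
    if not fitting:
        raise MemoryError("no free block is large enough")
    addr = min(fitting, key=lambda a: free_blocks[a])
    remaining = free_blocks.pop(addr) - size
    if remaining:
        free_blocks[addr + size] = remaining     # keep the unused tail as a free block
    return addr


MX_NAME_SIZE = 0x40        # hypothetical size of hole 1
SEPARATOR_SIZE = 0x3C      # leaked addrinfo, stays allocated between the holes
CACHE_ENTRY_SIZE = 0x60    # hypothetical size of hole 2

free_blocks = {
    0x1000: MX_NAME_SIZE,                                        # hole 1
    0x1000 + MX_NAME_SIZE + SEPARATOR_SIZE: CACHE_ENTRY_SIZE,    # hole 2
    0x2000: 0x800,                                               # rest of the heap
}

cache_entry = tight_fit_alloc(free_blocks, CACHE_ENTRY_SIZE)     # on request creation
mx_name = tight_fit_alloc(free_blocks, MX_NAME_SIZE)             # later, on response parsing

print(hex(mx_name), hex(cache_entry))
```

Although the cache entry is allocated first, it ends up at the higher address, and the later MX name buffer lands in the hole just below it, separated only by the allocated separator.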
[38:02.940 --> 38:03.980]  So to create the holes,
[38:03.980 --> 38:06.960]  we can use a temporary allocation primitive.
[38:07.580 --> 38:09.480]  And this is relatively easy to achieve
[38:09.480 --> 38:14.320]  as every DNS answer has all sorts of names in it
[38:14.320 --> 38:16.260]  or we can cause it to have names in it.
[38:16.260 --> 38:18.360]  So either MX or PTR or CNAMEs,
[38:18.360 --> 38:19.460]  they all have name fields.
[38:19.460 --> 38:21.740]  And these names cause allocation on the heap.
[38:21.740 --> 38:24.440]  So the allocation is the size of the name we provide
[38:24.940 --> 38:27.880]  and its data is, of course, alphanumeric and controlled.
[38:28.600 --> 38:30.620]  This allocation will also cause
[38:30.960 --> 38:33.420]  a small addrinfo structure to be allocated as well.
[38:33.420 --> 38:35.600]  We need to take this into account when shaping.
[38:36.460 --> 38:38.960]  This allocation is freed when DNS parsing fails
[38:38.960 --> 38:42.320]  or when the record TTL expires.
[38:42.900 --> 38:44.980]  So this allocation is perfect
[38:44.980 --> 38:47.740]  for creating free regions of arbitrary size,
[38:47.740 --> 38:49.180]  which is what we need.
[38:49.180 --> 38:51.640]  However, we need to separate our free regions
[38:51.640 --> 38:55.560]  and we can do this using the addrinfo memory leak,
[38:55.560 --> 38:58.460]  the memory leak artifact Moshe talked about earlier.
[38:58.760 --> 39:00.960]  And just to recap shortly,
[39:00.960 --> 39:04.200]  we can see that an addrinfo structure is allocated
[39:04.200 --> 39:08.300]  and only then the record validity is checked.
[39:08.300 --> 39:09.340]  And if it's not valid,
[39:09.340 --> 39:11.480]  we will exit through an error flow
[39:12.010 --> 39:14.000]  and that will not free the allocation.
[39:14.000 --> 39:16.880]  So we basically have a memory leak of a known size
[39:16.880 --> 39:18.560]  and we can use this as a separator
[39:18.560 --> 39:20.800]  between our two free buffers.
[39:21.780 --> 39:23.420]  So we managed to shape the heap
[39:23.960 --> 39:26.080]  into the specific shape we want.
[39:26.080 --> 39:28.140]  And this allows us to reliably overflow
[39:28.140 --> 39:31.260]  the DNS cache entry every time.
[39:32.360 --> 39:34.420]  So assuming we can do this,
[39:34.420 --> 39:36.040]  and we just showed we can,
[39:36.040 --> 39:40.120]  what can we do with the CNAME pointer overwrite primitive?
[39:40.460 --> 39:41.880]  What can we overwrite?
[39:42.380 --> 39:45.900]  So this pointer write has all sorts of limitations.
[39:46.100 --> 39:48.500]  It's written to an address we overflowed
[39:48.500 --> 39:49.780]  into the cache entry,
[39:49.780 --> 39:52.600]  so naturally only alphanumeric addresses are allowed.
[39:52.660 --> 39:55.500]  However, this is a string overflow vulnerability.
[39:55.500 --> 39:57.440]  After all, we're overflowing with a name,
[39:57.440 --> 39:59.200]  so we do have a trailing null byte.
[39:59.400 --> 40:03.000]  In x86, little-endian architecture,
[40:03.000 --> 40:03.760]  this trailing null byte
[40:03.760 --> 40:06.420]  can be the most significant byte of the segment,
[40:06.420 --> 40:07.940]  allowing us a bit more flexibility
[40:07.940 --> 40:09.840]  in the addresses we can reach.
[40:11.160 --> 40:14.860]  However, nothing in this specific firmware
[40:14.860 --> 40:18.120]  is placed in a strictly alphanumeric memory address,
[40:18.120 --> 40:21.480]  so no code, no heap, no stack, no globals,
[40:21.480 --> 40:23.520]  nothing interesting we can overwrite.
[40:24.460 --> 40:27.460]  Luckily, due to the weird segmentation
[40:27.460 --> 40:30.140]  of this Turbo 186 processor,
[40:30.140 --> 40:31.960]  we can use a little trick
[40:31.960 --> 40:34.860]  and easily combine two alphanumeric bytes
[40:34.860 --> 40:37.500]  to reach a non-alphanumeric byte
[40:37.500 --> 40:38.840]  or a non-alphanumeric segment,
[40:38.840 --> 40:40.360]  and this will look something like this.
[40:40.360 --> 40:43.260]  For example, we choose the segment to be null byte 4b,
[40:43.260 --> 40:44.500]  which is allowed.
[40:44.800 --> 40:46.400]  It will then be shifted by eight bits
[40:46.400 --> 40:49.700]  and an alphanumeric offset will be added to it,
[40:49.700 --> 40:52.040]  resulting in a linear address
[40:52.040 --> 40:54.700]  containing a non-alphanumeric byte,
[40:54.700 --> 40:58.080]  which corresponds directly to a non-alphanumeric segment.
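
The address arithmetic behind this trick is short enough to write out. This is a sketch under assumptions: the offset 0x4141 is an arbitrary alphanumeric value chosen for illustration, and the helper names are hypothetical. Only the segment 0x004b, the 8-bit shift and the allowed character set come from the talk.

```python
import string

# Bytes the overflow may contain: letters, digits, hyphen and period.
ALLOWED = set((string.ascii_letters + string.digits + "-.").encode())


def linear_address(segment: int, offset: int, shift: int = 8) -> int:
    """Segment:offset to linear address; the shift is 8 here, 4 on standard x86."""
    return (segment << shift) + offset


def is_allowed(value: int, width: int) -> bool:
    """True if every byte of the value is in the allowed character set."""
    return all(b in ALLOWED for b in value.to_bytes(width, "little"))


segment = 0x004B      # low byte 'K'; the high byte is the string's trailing null
offset = 0x4141       # 'AA', assumed for illustration

target = linear_address(segment, offset)
print(hex(target), is_allowed(offset, 2), is_allowed(target >> 8, 1))
```

Here 0x4B and 0x41 add up to 0x8C in the second byte of the linear address, so the write reaches the same memory as segment 0x008C with offset 0x0041, a segment that could never be spelled with allowed bytes.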
[40:58.160 --> 40:59.920]  So using this technique
[40:59.920 --> 41:02.280]  due to the weird segmentation feature,
[41:02.680 --> 41:06.060]  we can reach the heap utility code section,
[41:06.060 --> 41:07.700]  which has all sorts of functions,
[41:07.700 --> 41:11.000]  such as free and malloc, which we can overwrite.
[41:11.000 --> 41:13.320]  Again, this is a 16-bit real mode,
[41:13.320 --> 41:15.720]  so we don't have DEP or ASLR,
[41:15.720 --> 41:18.140]  allowing us to overwrite code with our primitive.
[41:19.080 --> 41:23.580]  So when overwriting code in x86,
[41:23.580 --> 41:24.760]  one of the interesting destinations
[41:24.760 --> 41:27.240]  to overwrite with a pointer is a far call opcode,
[41:27.240 --> 41:29.220]  as a far call has an absolute address
[41:29.220 --> 41:31.420]  encoded directly in it.
[41:31.480 --> 41:34.220]  So if we can overwrite this destination address
[41:34.220 --> 41:38.580]  with our pointer, we can cause the execution flow
[41:38.580 --> 41:40.980]  to move into our CNAME buffer.
[41:41.100 --> 41:43.720]  So we can patch the far call using our primitive
[41:43.720 --> 41:46.840]  and execute our controlled payload.
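
A far call in 16-bit x86 is the opcode byte 0x9A followed by a two-byte offset and a two-byte segment, both little-endian, so a four-byte pointer write over those operand bytes redirects the call. The sketch below patches a made-up code buffer directly; in the exploit the write is performed by the DNS parsing logic, and the addresses used here are hypothetical.

```python
import struct

FAR_CALL_OPCODE = 0x9A    # CALL ptr16:16


def far_call_target(code: bytearray, at: int):
    """Decode the segment:offset operand of the far call located at `at`."""
    if code[at] != FAR_CALL_OPCODE:
        raise ValueError("not a far call")
    offset, segment = struct.unpack_from("<HH", code, at + 1)
    return segment, offset


def write_far_pointer(code: bytearray, where: int, offset: int, segment: int) -> None:
    """The four-byte pointer write: offset first, then segment, little-endian."""
    code[where:where + 4] = struct.pack("<HH", offset, segment)


# nop; call 2000:1000; nop  (a made-up code fragment)
code = bytearray(b"\x90\x9a\x00\x10\x00\x20\x90")
before = far_call_target(code, 1)

# Aim the write at the operand bytes, one byte past the opcode.
write_far_pointer(code, where=2, offset=0x0123, segment=0x0456)
after = far_call_target(code, 1)

print(before, after)
```

The opcode byte itself is left untouched, so the instruction still decodes as a far call, only with a new destination.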
[41:47.560 --> 41:50.300]  So we do this exactly, we patch a far call
[41:50.300 --> 41:52.720]  in the free error flow, which is called
[41:52.720 --> 41:54.840]  when metadata corruption is detected.
[41:54.880 --> 41:57.720]  Naturally, we do corrupt the heap when overflowing,
[41:57.720 --> 41:59.280]  so this error flow will take place
[41:59.280 --> 42:01.700]  when our allocated blocks are freed.
[42:02.960 --> 42:06.040]  So let's shortly recap what we did here.
[42:06.040 --> 42:08.060]  First, we shaped the heap in order to get
[42:08.060 --> 42:10.040]  the MXNAME buffer to be directly before
[42:10.040 --> 42:12.720]  the DNS cache entry with a separator in between
[42:12.720 --> 42:14.160]  that's not shown here.
[42:14.380 --> 42:17.540]  We then overflow to overwrite the dnscAddrInfoPtr
[42:17.540 --> 42:21.800]  to point into a far call in the free function.
[42:22.560 --> 42:25.480]  We then process a CNAME record,
[42:25.480 --> 42:27.540]  the parsing logic will process a CNAME record
[42:27.540 --> 42:31.020]  containing some evil alphanumeric payload,
[42:31.020 --> 42:33.380]  which will be allocated on the heap
[42:33.380 --> 42:36.040]  and a pointer to this payload will be placed
[42:36.040 --> 42:40.400]  into the address we put in the dnscAddrInfoPtr,
[42:40.400 --> 42:42.320]  overwriting a far call destination,
[42:42.320 --> 42:44.260]  directing it to our evil payload,
[42:44.260 --> 42:46.660]  which can be, for example, some alphanumeric shellcode.
[42:47.760 --> 42:51.840]  Triggering this payload is rather easy,
[42:51.840 --> 42:53.100]  the free error flow will be triggered
[42:53.580 --> 42:58.120]  when the MXNAME record we overflowed from is freed.
[42:58.860 --> 43:02.660]  And the CNAME buffer, specifically what we overflowed with,
[43:02.660 --> 43:05.720]  contains a two-stage alphanumeric shellcode,
[43:05.720 --> 43:09.420]  which will decode itself and allow us
[43:09.420 --> 43:12.480]  to run essentially arbitrary payloads,
[43:12.480 --> 43:15.040]  achieving arbitrary payload execution.
[43:15.400 --> 43:17.760]  Specifically, our payload just turns off power
[43:17.760 --> 43:21.000]  to all UPS outlets, turning off any critical device
[43:21.000 --> 43:22.840]  that was supposed to stay on.
[43:23.000 --> 43:26.840]  And we will now see a short demo of this payload execution.
[44:02.930 --> 44:05.590]  Thanks for listening and tuning into our presentation.
[44:05.590 --> 44:07.950]  We'll be glad if you join us in the Q&A.
